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We present a coarse-grained lattice model of solvation thermodynamics and the hydrophobic effect that 
implements the ideas of Lum- Chandler- Weeks (LCW) theory [J. Phys. Chem. B 103, 4570 (1999)] and 
improves upon previous lattice models based on it. Through comparison with molecular simulation, we show 
that our model captures the length-scale and curvature dependence of solvation free energies with near- 
quantitative accuracy and two to three orders of magnitude less computational effort, and further, correctly 
describes the large but rare solvent fluctuations that are involved in dewetting, vapor tube formation and 
hydrophobic assembly. Our model is intermediate in detail and complexity between implicit-solvent models 
and explicit-water simulations. 



I. INTRODUCTION 

This is a technical paper that addresses how the hy- 
drophobic effect may be understood quantitatively. De- 
spite its technical nature, the physical ideas and final 
model we formulate should be accessible and potentially 
interesting to a wide audience of researchers who are con- 
fronted with the many manifestations of the hydrophobic 
effect, and are in need of an effective quantitative tool for 
treating them. 

The hydrophobic effect is presumed to be an important 
driving force in biology and nanoscale self-assembly. 1-3 
Because of its collective nature and its length-scale 
dependence, 4 and because of its nonlocal dependence on 
solute surface moeities, 5-7 modeling the hydrophobic ef- 
fect remains a challenge. To treat it theoretically, one 
could track the explicit position of possibly tens of thou- 
sands of water molecules around solutes of interest, 8,9 
but the computational cost of this approach limits its 
applications. Alternatively, at significantly reduced cost, 
one could replace explicit waters by an implicit solvent 
model, as is done in the generalized Born and surface area 
(GBSA) approach. 10 ' 11 In this paper, building on previ- 
ous efforts, 12-14 we propose a coarse-grained model inter- 
mediate in detail between these two extremes, one that 
retains most of the computational advantage of implicit 
solvent models and overcomes two of their significant con- 
ceptual flaws: their incorrect scaling behavior and their 
neglect of rare but large solvent density fluctuations that 
play pivotal roles in the dynamics of assembly. 

Solvation free energies 15,16 of solutes with sub- 
nanometer features, exactly the size prevalent in bi- 
ological regimes, do not in general 17 scale as surface 
area. 12,19 ' 20 By construction, models that assume such 
scaling significantly underestimate the driving force for 
hydrophobic assembly. ,21 Our model, on the other hand, 
captures the correct scaling behavior with solute size for 
generic solute geometries. 

Since hydrophobicity is a solvent property as much as 
it is a solute property, it is important to consider the sol- 
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vent on length scales dictated by the solute(s). Numer- 
ous studies of hydrophobicity 4, 14,22-32 have shown that 
rare solvent motions and dewetting transitions in con- 
fining environments play a critical role in solute assem- 
bly and function. Our model adequately captures these 
rare and important fluctuations. To demonstrate this, 
we consider the water number distribution Py(N) in a 
probe volume V. Hummer et al. introduced the idea of 
characterizing this distribution in the context of solva- 
tion theory, 33 and the utility of this function has been 
demonstrated subsequently. 34 

The GBSA model, 10 ' 11 widely used in biological set- 
tings, captures the effect of electrostatics with reasonable 
accuracy, but its treatment of the hydrophobic effect is 
less adequate, 20,37-39 for the reasons discussed above. In- 
teresting examples of solutes for which hydrophobicity is 
essential, and for which GBSA is unsuitable, include large 
classes of proteins, such as those involved in transmem- 
brane protein recognition and insertion, 40 and versatile 
chaperones. 41 It is these kinds of solutes for which our 
approach may eventually prove most useful. 

The ideas behind our model are those of Lum- 
Chandler- Weeks (LCW) 12 theory. Ten Wolde, Sun and 
Chandler 13 ' 14 generalized this theory by casting it in 
terms of a Hamiltonian for a lattice field theory. The 
motivation for that development was to facilitate treat- 
ments of large length scale dynamics. The motivation of 
the current work is similar, though in this paper we con- 
fine our attention to time-independent properties. The 
main contribution of this paper is to improve upon these 
previous attempts, and to introduce concrete implemen- 
tations of the underlying theory that illustrate the im- 
provements, which are significant. 

The paper is organized as follows. In Section II, we 
sketch the physical ideas behind our model and present 
their implementation in a tractable format. The deriva- 
tion and approximations made therein are left to the Ap- 
pendix. In Section III, we consider the accuracy of our 
model by computing the solvation free energies of solutes 
and Py(N) distributions for various geometries, with and 
without adhesive solute-solvent interactions. Finally, in 
Section IV we conclude with a discussion of the mer- 
its and limitations of the present implementation of the 
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model. 



II. MODEL 

A. Density fields and Hamiltonian 

In this section we first consider general features of a 
liquid solvent, specializing to water only later. We fo- 
cus on the solvent density, p(r). For water in particular, 
p(r) refers to the instantaneous positions of water oxygen 
atoms. Effects of other variables such as molecular ori- 
entations appear implicitly in terms of parameters. We 
decompose density into large and small length-scale con- 
tributions pen(r) and Sp(r), respectively, 



p(r) = p e n{r) + Sp(r). 



(1) 



Here, p e is the bulk liquid density, and n(r) is an Ising-like 
field that is 1 in regions that are locally liquid-like and 
in regions that are locally vapor-like. The field takes on 
intermediate values only around interfacial regions. This 
large length-scale field describes extended liquid-vapor 
interfaces, while the small length-scale field describes 
more rapidly-varying density fluctuations. This separa- 
tion implies some form of space-time coarse-graining to 
define n(r), a coarse-graining which is most reasonable 
for dense fluids far from their critical points. The key 
development of LCW was to describe how to (a) per- 
form this decomposition, and (b) couple the two separate 
fields. 

Building on the work of ten Woldc, Sun and 
Chandler, 13 we construct a Hamiltonian for the solvent 
density that captures the dominant physics. We have 

H[n(r),5p(r)]=H lalge [n(r)} 

+ H smM [Sp(r);n(r)] + H int [n(r),Sp(r)]. (2) 

The term H\ algc [n(r)] captures the physics of interface 
formation in n(r). For the term H sma n[Sp(r); n(r)], 
we exploit the observation that small length-scale den- 
sity fluctuations in homogeneous liquids obey Gaussian 
statistics. 33,42 ' 43 Thus, for a given configuration of n(r), 
we assume that Sp(r) has Gaussian statistics with vari- 
ance 

X(r,v';[n(r)}) = (Sp(r)5p(v')) n{r) . 

Here, the right-hand side denotes the thermal average of 
Sp(r)Sp(r') under the constraint of fixed n(r). The term 
H sina ,\i[5p(r);n(r)] is then a Gaussian with this variance, 
namely 



H sman [Sp(r);n(r)] = 
k B T 



8p(T) x - 1 (ry;[n(T)])6p(T'), (3) 



where k^T is temperature, T, times Boltzmann's con- 
stant. For conciseness, we use an abbreviated integration 



notation, where the integration variable is denoted with a 
subscript to the integral sign, and the integration domain 
is all of space unless otherwise stated. We approximate 
the variance with 



X(r,r'; [n(r)]) 



Xo(r-r'), n(r) = n(r') = 1; 



0, 



otherwise, 



(4) 



where xo( r ) can be written in terms of the radial distri- 
bution function g(r) as 



Xo(r)= W <5(r) + p £ 2 [ ff (|r|)-l]. 



(5) 



For the uses we make of the approximation in Equa- 
tion (4), corrections have quantitative but not qualitative 
effects, as discussed in the Appendix and also Ref. 44. 
Finally, i?i nt [n(r), Sp(r)] is an effective coupling between 
n(r) and 5p(r) due to unbalanced attractive forces in the 
solvent, whose details are given in the Appendix. 

In the absence of large solutes, fluctuations in n(r) are 
unlikely. The only fluctuations of significance in that 
case are those described by 6p(r). In the presence of 
large solutes, however, n(r) will often differ significantly 
from its bulk mean value. In that case the statistics of 
Sp(r) is modified and the coupling i?i nt [n(r), 8p(r)] be- 
tween n(r) and Sp(r) becomes significant. When Sp(r) 
is integrated out, a renormalizcd Hamiltonian for n(r) 
results. 

LCW theory 12 is a mean-field theory for the average 
large length-scale field, (n(r)), so it ignores the effects of 
large-scale fluctuations in n(r). Subsequent lattice im- 
plementations of LCW theory 13 ' 14,45 have incorporated 
fluctuations in the simplest possible manner. The present 
model refines these previous attempts to achieve near- 
quantitative accuracy for solvation free-energies and cor- 
rect behavior of fluctuations in n(r). Most importantly, 
we improve the calculation of the interfacial energies due 
to n(r). 

To write down the renormalizcd Hamiltonian, we be- 
gin by describing n(r) with reference to a cubic grid of 
spacing A, depicted in Figure 1, and we denote its value 
at the center of cell i by n». Then, n(r) is given by 



n ( r ) = ^"-i*(r - I-,), 



(6) 



where is the center of cell i and rij is 1 or 0, and the 
sum is over all cells i. The function ^(r) is maximal with 
value 1 at r = 0; it is cubic symmetric about the origin; 
and it is zero when the magnitude of any of the Carte- 
sian components of r is greater than A. The value of A, 
about which we will say more later, should be roughly 
the size of the bulk correlation length of the liquid sol- 
vent. The typical size of interfacial energies between cells 
on this grid is 7A 2 , where 7 is the liquid- vapor surface 
tension of the solvent. The dissolved solute excludes sol- 
vent density from a volume v, and we define v to be its 
complement, so that the total volume of the system is 
the union of v and v. The excluded volume can be of any 
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n(r) = 1 



n(r) = 




v, cell i 



FIG. 1. Schematic showing the solute and the large length- 
scale density field on a grid 



shape, and it can be composed of disconnected parts. 
In Ref. 42, regions within v are called "in", and regions 
within v are called "out" . For any volume V, we have 
the projector 6y( r )> 



Mr) 



1, reV, 
0, otherwise, 



(7) 



so that 6y(r) + by(r) = 1. We denote the overlap of 
v or v with cell i by Vi and Uj, respectively. 

In the absence of a solute, the Gaussian nature of 6p(r) 
results in solvent number fluctuation correlations. The 
correlations between the portions of cells i and j that 
overlap with two volumes V and V', respectively, form 
the elements of a matrix 



Xij(V,V) 



Mr)xo(r,r')Mr')- (8) 



Here, the domain of the r and r' integrals are restricted 
to the volume of cells i and j, as indicated. A way to 
estimate these elements is outlined in the Appendix. The 
resulting matrix is used to calculate entropic effects due 
to solvent exclusion and the linear response of solvent 
density to external fields. 

Part of the renormalized Hamiltonian is the free en- 
ergy ifi arge [n(r)] of the field n(r) in the absence of ex- 
ternal solutes. We estimate this contribution using a 
Landau-Ginzburg Hamiltonian 



large [ 



■Ml-/ 



w(n{r),n) + — |Vra(r)|' 



(9) 



where w{p/ pt, p) is the grand free energy density for the 
liquid solvent at a given density, p, and chemical po- 
tential, fi, relative to that of the gas. The parameter m 
reflects surface tension and intrinsic interfacial width. At 



liquid-gas phase coexistence, p, — 0, the value of the inte- 
gral is conveniently expressed as the sum 7A 2 ^TJ, hi with 
the local integrals 



hi= ^L dx L dy 



Zi+\ 



dz 



w(n(x, y, z), 0) + —\Vn(x, y, z)\ 2 \ . (10) 

The quantity hi depends only on the values of rij for 
cells j that share one of the corners of cell i. There 
are only 14 distinct possible values of hi, which can 
be precalculated numerically for a given free energy 
density w(n,0) and cell size A, as detailed in the Ap- 
pendix. In previous modeling, the simpler Ising model 
estimate 7 A 2 Yluj) ( n i ~ n j) 2 nas been used. For reasons 
discussed in the Appendix, this simpler estimate proves 
less accurate than the one used here. 

With the above notation, we now write the Hamilto- 
nian for our model, which constitutes the main result of 
the paper, 

-ffeff[{«*}] = 1>? ^2 hi ~ MPM 3 rii 

i i 

+ Kj~](/>i(-piniVi) 

i 

+ k B T[(N) 2 v /2<T v + C/2}, (11a) 



where 



<j>i = 2ap e ^ 
(N) v = pe^njVj, 

i 



1 1 w 

1 - -ni - — 2^ 



12 



j (nni) 



'■J 



c 



ln(27rcr„), 



(N) v > 1, 



max[ln(27ra„), (N) v ], otherwise. 



(lib) 

(11c) 
(lid) 

(lie) 



The field fa, with strength governed by the positive pa- 
rameter a, is what Weeks has termed an "unbalancing 
potential" . 46-50 The primed sum over j (nni) is a sum over 
the six cells j that are nearest neighbors to cell i. The fi- 
nal expression for fa shown above, with renormalization 
constant K, is the result of an accurate and computa- 
tionally convenient approximation, which is described in 
detail in the Appendix. 

The terms on the right-hand-side of Equation (11a) 
respectively approximate: the free-energy cost of estab- 
lishing interfaces in n(r), the pressure-induced bias to- 
wards the liquid phase, the effective coupling between 
n(r) and 6p(r) induced by the presence of a solute, and 
the entropic cost of excluding solvent density from the 
portions of v where n(r) = 1. As the total number of wa- 
ters to be excluded, (N) v , approaches zero, the statistics 
of solvent number fluctuations in v changes from Gaus- 
sian to Poisson, so that its variance, a v , also approaches 
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zero. Equation (lie) captures this change continuously 
and prevents -ff e ff[{^i}] from becoming infinitely negative 
in this limit. 

B. Incorporating Solute-solvent attractions 

The above Hamiltonian pertains to the simplest case, 
where the solute interacts with the solvent only by hard- 
core repulsion. Realistic solutes additionally have attrac- 
tive interactions with the solvent that can be modeled as 
an external potential u(r) that couples to p(r). Such a 
potential induces an additional term in our Hamiltonian, 
which we denote by i?„[{n,}]. To describe this term, we 
define a discretized analogue Ui of u(r), 

«i = ^ f Mr)u(r). (12) 

Notice that is independent of u(r) for values of r inside 
the solute. The apparent divergence, where v completely 
overlaps cell i, has no effect in the final expression. In 
particular, 

H»[{m}] = 

Xij 

{v,v)nj(N) v /a v 

i j 
3 

+ ~y~ ^2 P U i n iXi3 v)nj/3Uj , (13) 
i,3 

where ft is the reciprocal of k^T. The first part is the 
mean-field contribution J u(r)(p(r)), while the last term 
is the entropic cost of the external potential modifying 
the average solvent density in the vicinity of the solute. 

C. Parameters of the Hamiltonian 

We now specialize our model to water at ambient con- 
ditions, T = 298 K and 1 atm pressure, p. Further, we 
comment upon what changes are required for applica- 
tions at different states of water. 

The cell size length A should be no smaller than the 
intrinsic width of the liquid- vapor interface. Based upon 
the interfacial profile of the SPC/E model, 19,51 we there- 
fore pick A = 4 A. This is the minimal scale over which 
the time-averaged solvent density can transition from 
liquid-like to vapor-like values. Following Ref. 52, we 
use the free-energy density 

w ( n ^) = ^r( n - i) 2 " 2 - PPe n > ( 14 ) 

with d = 1.27 A because this choice reproduces the sig- 
moidal density profile of water- vapor interfaces at coex- 
istence. The resulting values of hi are tabulated in the 



Appendix. The bulk liquid density pg is the experimen- 
tal value, 53 whereby a liquid cell contains peX 3 « 2.13 
waters on average. The value of m is chosen such that 
the interfacial energy of vapor spheres of radius R tends 
to An^jR 2 for large R. At ambient conditions, the ex- 
perimental value for the surface tension 53 yields 7 A 2 w 
2.80 fceT. Finally, the relative chemical potential is given 
by p « (p — p vap )/p£, where p vap is the vapor pressure 
at 298 K. This relationship gives p « 7.16 x KT 4 fc B T, 
which is quite small, reflecting that water at ambient 
conditions is nearly at coexistence with its vapor. 

The matrix elements Xij(Vi V) are computed from the 
radial distribution function, g(r), and we derive this func- 
tion from Nartcn and Levy's tabulated data. 54 It is a 
convenient data set because it covers a broad range of 
temperatures for the liquid at and near p = 1 atm. At 
one temperature, 25°C, we have checked that a different 
estimate of the radial distribution function, that of the 
SPC/E model, yields similar matrix elements, and the 
resulting solvation properties are essentially identical to 
those obtained when the Xij(X V')'s are computed from 
the Narten-Levy data at the same temperature. 

The only parameters that we estimate through fitting 
are the strength a of the unbalancing potential and the 
renormalization constant K. In the absence of solute- 
solvent attractions, only the product of a and K is rel- 
evant. Values of a and K with Kapt = 2.1 k^T allow 
us to match the solvation free energies of hard spheres 
in SPC/E water (see below). By comparing the aver- 
age value of the computationally convenient approximate 
expression involving in Equation (11a) with that of 
its complete and unrenormalizcd counterpart, as is done 
in the Appendix, we find that K is about 1/2, so that 
api w A.2k^T. This value for a is close to the original 
LCW estimate, 12 arrived at from a different criterion. 

These values are applicable at ambient conditions. 
As temperature and pressure vary, only 7, p and g(r) 
vary appreciably, while K varies slightly. In partic- 
ular, surface tension decreases roughly linearly with 
temperature 53 (with d7/dT w — 0.15mJ/m 2 -K which is 
-5.8 x 1(T 3 k B T/\ 2 -K at T = 298 K). As noted above, p 
increases roughly linearly with pressure. The pair corre- 
lation function g{r) loses some structure for temperatures 
above 50 °C. The terms that are modeled by the renor- 
malization constant K reflect the degree to which solvent 
density layers next to a solute. Since this layering reflects 
the structure of g(r), we expect K to be slightly state- 
dependent, with its value increasing with temperature. 

Conversely, liquid water has a nearly constant den- 
sity and bulk correlation length at the temperatures and 
pressures where our model would be useful, so pi and A 
can be taken as constant as well. Theoretical estimates 
for a in simple liquids (Eq. 4 in Ref 46) are state- 
independent, so we expect that in water, a will be nearly 
state-independent as well. 55 
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III. APPLICATIONS AND RESULTS 
A. Solvation Free Energies 

To test our model's ability to capture the length-scale 
dependence of solvation, and to parametrize the strength 
of the unbalancing potential, we have calculated the 
solvation free energy of hard spheres of different radii. 
Whether within our model or using explicit water simu- 
lations, we calculate the solvation free energy of a solute 
following the guidelines of Ref. 56. Briefly, we first de- 
fine a series of M + 1 solutes Sq through Sm that slowly 
interpolate from an empty system (Sq) to the final so- 
lute of interest (Sm)- We then sequentially calculate the 
free energy difference between solute m and solute m + 1 
using the Bennett acceptance ratio estimator 57 (BAR), 
and, where necessary, the linear interpolation stratifica- 
tion procedure of Ref. 56. Error estimates are calculated 
using BAR, and are generally smaller than 0.5%. 

Our model (Equation (11a)) involves only simple arith- 
metic, so free energies can be calculated with little com- 
putational effort. For example, calculating the solvation 
free energy of hard spheres of up to 14 A in radius in 
increments of 0.5 A (Figure 2) takes about 1 hour on 
a single 2 GHz machine with a code that has not been 
fully optimized, whereas a similar calculation in explicit 
SPC/E waters with GROMACS 58 would take around 600 
hours on the same machine to obtain a similar statistical 
accuracy. 

Hard-sphere solvation free energies scale as solute vol- 
ume for small spheres, and as surface area for large 
spheres, with a smooth crossover at intermediate sizes. 4 
Figure 2 illustrates this behavior and compares the re- 
sults of our model to previous simulation results using 
SPC/E water. 19 As the model manifestly reproduces the 
small- and large-length scale limits, the most significant 
feature illustrated in Figure 2 is the gradual crossover 
from volume to surface area scaling. Ignoring the unbal- 
ancing potential leads to a qualitatively correct scaling 
behavior. However, adjustment of the single parameter a, 
which determines the strength of the unbalancing poten- 
tial, yields a near-exact agreement between our model 
and the SPC/E results for all sphere sizes. In all subse- 
quent results, the parameter a is fixed at this value. 

The model results have small lattice artifacts — results 
that depend upon the position of the solute relative to 
that of the coarse-grained lattice — as shown in the inset 
of Figure 2. When studying stationary solutes, lattice 
artifacts may be mitigated by performing multiple cal- 
culations, differing only by small displacements of the 
solutes, and then averaging the results. When studying 
dynamical phenomena, lattice artifacts tend to pin so- 
lutes into alignment with the coarse-grained lattice. For 
arbitrary molecular solutes, we expect that pinning forces 
acting on one portion of the molecule will generically op- 
pose pinning forces on other parts of the molecule, so that 
the total pinning forces will largely cancel out. However, 
when treating many identical molecules, lattice artifacts 




Sphere radius R (nm) 



FIG. 2. Solvation free energies G of hard spheres of increasing 
radii, as calculated from explicit SPC/E water simulations 19 
(solid blue), from the coarse-grained model (solid black), and 
from the most common GBSA variant 60 (arrow at bottom 
right). When the coarse-grained model has no unbalancing 
potential (a = 0, dashed gray), the intermediate-size regime 
is only qualitatively reproduced. For large spheres, the ratio 
of G to surface area tends to the liquid- vapor surface tension 7 
(horizontal red dots). Inset: Illustration of lattice artifacts. 
The spheres are centered at different offsets from the lattice: 
a generic position (0.98 A, 0.79 A, 1.89 A) that breaks all ro- 
tational and mirror symmetries (black), a lattice cell corner 
(blue) and a lattice cell center (red). All three curves are 
identical for R < 0.35 nm. 



can add constructively, and additional steps are needed 
to mitigate them. 59 

Since the unbalancing potential is explicitly 
parametrized with the solvation free energy of hard 
spheres, it is useful to evaluate the accuracy of the 
results in other geometries. To this effect we computed 
the solvation free energies of a family of hexagonal 
plates, consisting of 37 methane-like oily sites arranged 
into three concentric rings. We control the size of these 
plates, depicted in Figure 3, by varying the distance d 
between neighboring oily sites. For our calculations 
with explicit SPC/E water, the sites are uncharged and 
interact with the solvent molecules via a standard 62 
water-methane Lennard- Jones potential. To study the 
role of attractive interactions, we split this Lennard- 
Jones potential using the Weeks-Chandler- Andersen 
(WCA) prescription 64 into a repulsive part uo(r) and 
an attractive part Au(r). The magnitude of the at- 
tractive tail can be varied systematically with a scaling 
parameter 77, such that 

u v (r) — u (r) + r]Au(r). (15) 

For the ideal hydrophobic plate, we set 77 to zero. 

In the coarse-grained model, the repulsive core of the 
solute is represented as an excluded volume. To con- 
struct it, we replace each solute particle by a thermally- 



6 





20 r 




15 - 






I 

3 


i o 






< 


5 - 


zn 




<j 




co 


- 


6" 




si 






-5 - 




-in - 




1.5 2 2.5 3 3.5 4 4.5 5 
Solute Center Separation d (A) 

FIG. 3. Solvation free energies G of hexagonal plates, 
as a function of plate size, as calculated by the coarse- 
grained model (solid lines), by explicit SPC/E water sim- 
ulations (points), and by the most common GBSA variant 
(arrow on right). Three values of the attractive interaction 
strength r\ are shown: 0.0 (black), 0.5 (red) and 1.0 (blue). 
Solvent-accessible surface areas (SASAs) were calculated us- 
ing VMD, 61 with a particle radius of 1.97 A, a solvent radius 
of 1.4 A, and 1,000,000 samples per atom. The bulk liquid- 
vapor surface tension of water (horizontal red dots) is shown. 
Inset: Detail of the hexagonal plate. The solvent-excluded 
volume of each oily site is a sphere of radius Ro = 3.37 A. 



equivalent hard sphere, whose radius R$ is estimated ac- 
cording to 



Rn — 



B. Fluctuations 

A more detailed probe of solvent behavior than sol- 
vation free energies is the probability Py(N) of find- 
ing TV waters in a given volume V. The solvation free 
energy G of an ideal, volume-excluding hydrophobe is 
simply 33 f3G = — lnPy(0), and we can glean information 
about hydrophobicity and dewetting from the behavior 
at non-zero N. 

In the present model, we estimate Pv{N) by a two- 
step procedure. For any given solvent configuration {rij}, 
the small length-scale fluctuations of Sp(r) give rise to a 
Gaussian distribution in the numbers of waters, so that 



P v (N\{m}) cx exp[-(N - (N) v ) 2 /2a v ], 



(16) 



where, 



(N) v =J2 ni - ^2xij(V,v) nj (N) v /a v 

i 3 

~ Xl ^ V > V^jPiUj + fa) 



and, 



(TV 



'Y^n i Xij{V,V)n j 



(17) 



(18) 



dr[l - exp(-/3u (r))], 



Here, Vi is the overlap of the probe volume with cell i. 
Notice the use of the probe volume V in the Xij matrices. 
Formally, we then thermally average the above result over 
all possible solvent configurations to obtain 

P V (N) cx Py(A r |{nJ)exp(-/3 J ff cff [{n J }]). 

{»*} 

In practice, we estimate this sum by sampling a lattice 
variable n that closely correlates with N, given by 



which is a first approximation to the WCA value of this 
radius, 65 ' 66 and is essentially the radius at which Uo(r) 
is IcbT. The excluded volume is then the union of the 
hard-sphere volumes of each solute site. 

Figure 3 compares the solvation free energies for this 
family of solute plates computed from our atomistic sim- 
ulations with those computed from the coarse-grained 
model with the unbalancing parameter a determined 
above for solvated hard spheres. Now, with this different 
geometry, the coarse-grained model continues to perform 
well. The discrepancies are primarily due to the small 
underestimation, shown in Figure 2, of the solvation free 
energy of small spheres. Figure 3 also compares the sol- 
vation free energies of plates with increasing attractions 
to the corresponding results from explicit-water simu- 
lations. Aside from the small artifacts already present 
in the ideal solute case, the contribution of the attrac- 
tions to solvation free energies calculated with the coarse 
grained model is nearly quantitative. 



We divide the range of possible values of n into small 
overlapping windows, and sample relevant configura- 
tions at every value of n using Wang-Landau sampling 67 
along n, together with replica exchange, 68 to obtain good 
sampling and avoid kinetic traps. We then used the 
multistate Bennet acceptance ratio estimator 69 (MBAR) 
to reconstruct from these runs the probability distribu- 
tion P(n). During the umbrella sampling runs, lattice 
configurations with equal n are observed in proportion 
to their Boltzmann weight. Using the notation {rii} G n 
to denote all observed lattice gas configurations with a 
particular value of n, we finally obtain 



P v (N)=J2P(n) 



E 

{ni}£n 



Pv(N\{m}). 



To estimate the statistical errors in our procedure, we 
calculate Py(N) in five independent Monte Carlo runs, 
and estimate the standard error in the mean of In Py (N). 
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FIG. 4. Water number distribution in a 12 x 12 x 12 A 3 , 
as obtained using explicit SPC/E water simulations, 32 the 
present model, the present model without the unbalancing 
potential (a = 0), and a model with an Ising Lattice Gas and 
no unbalancing potential. 



For comparison, we also calculate these distribu- 
tions in SPC/E water using LAMMPS 70 as described 
previously, 32 paying careful attention to good sampling 
around free energy barriers. Errors were estimated with 
MBAR. 

In the absence of a solute, Py(N) is sensitive only to 
the interfacial energetics of the lattice gas. Figure 4 com- 
pares the Py (N) curve obtained using the present model 
for a 12 x 12 x 12 A 3 volume with results that we have 
previously obtained from simulation of SPC/E water, 32 
and with (a) a version of the coarse-grained model that 
lacks an unbalacing potential (a is set to zero), and (b) a 
version that additionally uses the naive Ising lattice gas 
for estimating interfacial energetics in n(r). Our present 
model captures the observed deviations from Gaussian 
behavior better than these simpler models, which reflects 
its higher accuracy in estimating interfacial energetics 
and microscopic curvature effects. 

We have also previously examined how hydrophobic 
solutes affect water number fluctuations in nearby probe 
volumes. 32 To evaluate the performance of our model in 
that scenario, we use the model hydrophobic plate so- 
lute described in Ref. 32. The plate is made up of oily 
particles with the same number density as water, whose 
centers lie inside a 24 x 24 x 3 A 3 volume 71 . Taking into 
account the van der Waals radii of the oily particles, the 
plate has approximate dimensions 28 x 28 x 7 A 3 . We 
model this solute in the same way as the hexagonal plates 
described above. As before, we explore the role of at- 
tractive interactions by varying the attraction strength 
parameter rj. 

Figure 5 shows the water number distribution in a 
24 x 24 x 3 A 3 probe volume adjacent to the plate. With 




Number of waters N 

FIG. 5. Water number distributions in a probe volume of size 
24 x 24 x 3 A 3 immediately adjacent to a model plate solute 
(inset) of varying attractive strength rj, in the coarse-grained 
model (solid lines) and in explicit SPC/E water (points). 
Defining z = to be the plane passing through the plate cen- 
ter, points in the probe volume (green) satisfy 5 A < z < 8 A, 
so that a water molecule touching the plate is located at the 
edge of the probe volume. 32 

no solute-solvent attractive interactions, the probabil- 
ity computed from the lattice model has a clear fat tail 
towards lower numbers of waters in the probe volume. 
This fat tail is the hallmark of a soft vapor-liquid inter- 
face, in this case a soft interface next to the hydropho- 
bic solute. 72-75 At higher attractive interactions, this fat 
tail is correspondingly depressed, but not entirely sup- 
pressed. Accordingly, in Ref. 32, the fat tail is only fully 
suppressed when rj exceeds 3.0. 

Figure 5 also evidences some of the limitations of the 
present model. The probe volume being less than one 
lattice cell thick, large lattice artifacts are inevitable. 
Moreover, since Py(iV) distributions are much more de- 
tailed probes of solvent structure than solvation free ener- 
gies, we expect more room for disagreement with simula- 
tion. Nevertheless, we emphasize that, by construction, 
no implicit solvation models can capture the above ef- 
fects on solvent structure, which underlie the pathways 
of hydrophobic assembly. Other coarse-grained solvation 
models (for example, see Ref. 76), on the other hand, can 
probe rare solvent fluctuations, and it would be useful to 
evaluate their accuracy in this respect as compared to 
explicit-water models and the present lattice model. 



C. Confinement 

To examine confinement in detail, we place two of the 
model hydrophobic plates at a distance d from each other, 
as shown in Figure 6, and calculate the water number 
distribution in a 24 x 24 x [d — 3) A probe volume be- 
tween them as a function of interplate separation d and 
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FIG. 6. Setup for examining water fluctuations under confine- 
ment (here, d — 8 A). The model hydrophobic plates 32 (grey 
particles) are placed dA apart: taking into account the van 
der Waals radii of about 2 A of the plates' oily particles and 
the 3 A thickness of each plate, the center of the first plate 
is placed at z = 0, the center of the second plate is placed 
at 2 = d + 7 A. The van der Waals radius of water (red and 
white sticks) being about 1.5 A, the 24 x 24 x (d — 3) A 3 probe 
volume (green) extends from z = 5 A to 2 = d ± 2 A. The 
plates are not perfectly flat, so some waters fit between the 
plates and the probe volume. 
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FIG. 7. Phase diagram for the interplate region of the system 
depicted in Figure 6. For the explicit SPC/E water simula- 
tions (blue), each symbol corresponds to an individual Pv(N) 
distribution that we have calculated (filled: wet state stable; 
open: dry state stable) . The phase boundary (blue dashes) is 
estimated from a linear interpolation of the relative stability 
of the wet and dry states. The relative stability is deter- 
mined from the relative depths of the basins in — In Pv (N) 
The phase boundary for the present model (red solid line) was 
estimated from a dense sampling of Py(N) distributions, and 
is accurate to ±0.1 A in d and ±0.1 in r\. 



tions that we observe. 36. 



attraction strength r\. Figure 7 summarizes the results 
in the form of a phase diagram. At small separations 
and low attractive strengths, the dry state (low N) is 
most stable, whereas high attractive strengths and large 
separations stabilize the wet state (high N). Generi- 
cally, the hydrophobic association of two such plates pro- 
ceeds through a dewetting transition in the inter-plate 
volume. 22-24,26 

The general, though not quantitative, agreement be- 
tween the coarse-grained model and the SPC/E data is 
very encouraging: bistability is observed in the Py(N) 
distributions in both cases, with the barriers at the nearly 
equal values of N, and with barrier heights that track the 
SPC/E barrier heights. The phase boundary in Figure 7 
closely tracks the phase boundary observed in explicit 
water, with a shift of less than 2 A for all r\. Moreover, as 
shown in Figure 8, once the general shift in the phase 
boundaries is accounted for, the Py(N) distributions 
for systems near that boundary obtained by the coarse- 
grained model and the SPC/E simulations agree reason- 
ably well. Hence, the present model is better suited than 
implicit solvation models for studies of nanoscale self- 
assembly or protein-protein interactions driven by the 
hydrophobic effect. A recently developed coarse-grained 
model of water (mW water) has been used to extensively 
probe these rare fluctuations, and their predictions also 
display the characteristic bistability of dewetting transi- 



IV. DISCUSSION 

We have presented a coarse-grained model of solvation 
thermodynamics that correctly reproduces the length- 
scale dependence of solvation free energies, and, more- 
over, correctly captures the behavior of the slow and rare 
solvent fluctuations that are pivotal in pathways to hy- 
drophobic assembly. Our model is applicable to generic 
solute shapes, and addresses the effects of solute-solvent 
attractive interactions. 

While our model successfully describes various aspects 
of the hydrophobic effect, several technical challenges 
must be addressed before it can be applied in biologi- 
cal settings. Most notably, electrostatic forces are miss- 
ing from our model. As a first approximation, the GB 
treatment may well be sufficiently accurate, as long as 
the low-permittivity cavity includes both the solute's ex- 
cluded volume and the regions where n(r) is zero. It 
may also be possible to implement electrostatics in terms 
of a dipole density field coupled to the water density 
field. The statistics of the dipole field are known to be 
Gaussian 78,79 so that their contribution to -ff G ff[{fii}] may 
be computed analytically. 

A second notable technical hurdle is to find efficient 
algorithms for calculating the gradient of -ff c ff[{«j}] with 
respect to the position of the solute's atomic centers, nec- 
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FIG. 8. Water density distribution of confined water 1 A from 
coexistence. These distributions are for the system depicted 
in Figure 6 when r\ = 0.5. Coexistence lines are shown in Fig- 
ure 7. The explicit water simulation data (black) corresponds 
to d = 11 A, while the coarse-grained model (red) results cor- 
respond to d — 9 A. The remaining Pv(N) distributions are 
included in the Supplementary Data 71 . 



essary for implementing realistic solute dynamics, such 
as Brownian dynamics. In the context of solvent lattice 
models, the problem is tractable for spherical solutes with 
limited overlap, 14,45 but the implementation of a solution 
for generic solutes is more challenging. 

Finally, as with implicit solvation models, our own 
model does not attempt to capture solvent dynamics. For 
thermodynamically-driven processes, almost any reason- 
able dynamics may suffice when estimating the kinetic 
prefactor of rate constants of interest. Indeed, in a pre- 
vious lattice model, 14 the solvent dynamics is approxi- 
mated by Glauber dynamics, the solute's by Langevin 
dynamics, and the relative rates at which the two dynam- 
ics are advanced are calibrated through physically rea- 
sonable arguments. However, it is known, as evidenced 
in the form of the Oseen tensor, that hydrodynamic 
interactions can be long-ranged 80 and can influence 
timescales of molecular processes by one or more orders 
of magnitude. 81 This observation may prove important in 
nanoscale assembly processes that are kinetically driven, 
rather than thermodynamically driven. 82 Approaches to 
implementing coarse-grained dynamics in a lattice set- 
ting include multiparticle collision dynamics, 83 fluctuat- 
ing hydrodynamics 84 and lattice Boltzmann methods, 85 
among others. We leave all dynamical considerations to 
future work. 
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Appendix A: Derivation of the Model 
1. Continuum formulation 

In this appendix, we derive Equation (11) starting from 
the microscopic ideas of LCW theory embodied in Equa- 
tion (2). The forms of i/ sma n[<5p(r); n(r)] and Fi arge [n(r)] 
are given in Equations (3) and (9), and are discussed in 
the main text. Here, we begin by discussing the details 
of the i?i n t[n(r), Sp(r)] term. Next, we integrate out the 
field Sp(r) to obtain the effective Hamiltonian H c ff[n(r)]. 
In the following section, we discretize it. 

Whenever n(r) is non-uniform, solvent molecules ex- 
perience an effective attraction towards denser regions, 
or equivalently, an effective repulsion from less dense re- 
gions. As argued by Weeks and coworkers, 12 ' 13,46 this 
effect can be modeled as a coupling, H int [n(r), 5p(r)], be- 
tween an external unbalancing potential, 0(r), and sol- 
vent density. In the absence of a solute, the energetics of 
this effect is completely contained in f/i arge [n(r)], but the 
presence of a solute gives rise to important corrections. 
Formally, the coupling is given by 

H int [n(r),Sp(r)} = J^(r)6p(r) + H noxm [n(r)}, (Al) 

where 



(r) = -2ap e [n{r) - 1]. 



(A2) 



Here, a determines the strength of the potential, and the 
overbar operator smears n(r) over the effective range of 
solvent-solvent attractive interactions. The potential is 
shifted so that it is zero for the uniform liquid. The term 
#norm[«(r)] is chosen so that, in the absence of a solute, 
H e g[n(r)] is identical to £zi arge [n(r)]. 

We now integrate out the field Sp(r) to ob- 
tain H Q g[n(r)]. For notational simplicity, we suppress the 
dependence of %(r,r') on n(r) manifest in Equation (4). 
The total density ptn(r) + Sp(r) is constrained to be zero 
for all points r in v, so the effective Hamiltonian is given 
by 

exp{-/3H eS [n(r)}} = J V6p{r) 

exp{-pH[n{r), 6p(r)}} ]J 5( Pe n(r) + 6p(r)). (A3) 

A long but straightforward calculation 42 yields 
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argc [Tl (r)] + k B T In ^/dct 2ttx v - / ^(r)^n(r) 



+ 



Pe n(r)- x (r,r")W) 



pen(r') 



x(r',r'")/W) 



r'" £v 



f f /30(r) X (r,r')Mr') + Hno rm [n(r)]. (A4) 



Here, x«( r : r ') i s the restriction of x(r,r') to the vol- 
ume v. As such, x« 1 ( r : r ') satisfies 

/ x^^'Mr',!-") = <5(r-r"), r, r" e v. (A5) 

To make ff c ff[n(r)] equal to iJi arge [n(r)] in the absence of 
a solute, 



An 



[n(r)] = 



W(r) X (r,r')»(r'). (A6) 



It is useful to recast Equation (A4) into a form where 
the physical significance of each term is manifest. To do 
so, we first note how the constraint of zero solvent den- 
sity inside v modifies the solvent density and its fluctua- 
tion spectrum outside of v. As described in Ref. 42, the 
average of Sp(r)Sp(r') in the presence of the constraint, 
denoted by x' m K r ; r '); i s given by 



X (m) (r,r')-X(r,r') 

- / / X(r,r")x„- 1 (r",r"')x(r"',r'). (A7) 

Jr"Ev Jr"'Ev 

From Equation (A5), it follows that x^( r > r ') 1S zer0 
whenever r or r' are in v, as required by the solvent ex- 
clusion constraint. To describe the constraint's effect on 
the average density, we introduce an auxiliary field c(r) 
that satisfies 



X(r, r')c(r') = p e n(r), rev, (A8a) 
c(r) =0, rev. (A8b) 



v' Ev 



In terms of c(r) and x' m \ the average density in the 
presence of the solute is given by 

(p(r)) = p e n(r) 

-f x(r,r')c(r')- / X ,m) (r,r')»(r'). (A9) 

J r' Ev J r' Ev 

Equation (A4) can now be written much more simply 



r 



as follows. 



H eS [n(r)} = i?i arge [n(r)] - / (f>(r)p e n(r) 

+ k B T In y/det 2tt Xv + f pen{r)c(r) 



rGf J r' £v 



+ 



r)x(r,r')c(r') 
»(r)( X -x W )(r,r')W(r'). (A10) 



For the geometries we considered, the sum of the last two 
terms of this equation is, on average, opposite in sign 
but nearly proportional to the much simpler remaining 
term involving (j)(r) (see Section D). Physically, these 
three terms capture the energetic bonus of driving Sp(r) 
to inside v where </> is positive, the energetic cost of 
the consequent density enhancement just outside of v, 
and the small difference between (a) the entropic cost 
associated with <f) modifying the solvent density in the 
presence of a solute and (b) that same cost in the absence 
of a solute. In typical configurations, the three terms are 
roughly proportional to the subvolume of v where n(r) = 
1 , and capture how solvation free energies are modified by 
the microscopic curvature of v. We have found it accurate 
to model the effect of these three terms using only the 
second term of Equation (A10), whose strength is then 
renormalized by a factor K. The resulting approximation 
for # int [n(r)] is 



#int[n(r)] « -K / cj>(v)p e n(r) 



(All) 



Finally, we introduce an important simplification in 
i?smaii[n(r)]. Instead of solving Equation (A8a) to obtain 
the value of the field c(r) in v, we replace c(r) there by its 
average value, ci, and obtain the much simpler relation 



c\ = (N) v /a v , 



where 



(N) v 



Pen(r), 



Ov = \ \ x(r,r'). 



(A12) 

(A13) 
(A14) 
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Equivalently, in Equation (A3) we enforce the single con- 
straint that f pin(r) + 5p(v) be zero, instead of enforc- 
ing the multitude of constraints that pen(r) + 5p(r) be 
zero at every point r in d. We have verified that this ap- 
proximation, dubbed the "one-basis set approximation" 
in previous works, 13,52 does not appreciably change the 
solvation free energies and Pv(N) distributions that we 
have obtained. Crucially, this approximation replaces the 
large (though sparse) linear system of Equation (A8a) 
with the trivial relation of Equation (A12), and is there- 
fore very advantageous computationally. With it, the 
term H sma . n [n(r)] is given by 



ffsmaiiKr)] = k B T[(N) 2 v /2a v + C/2], 



(A15) 



The normalization constant C is defined by Equa- 
tion (He). When the value of (N) v becomes small, the 
integral defining a v is dominated by the (5-function in 
Equation (5) and takes the value u v w (N) v . The value 
ln(27r<7„) of C that is applicable for larger (N) v thus 
tends unphysically to negative infinity as (N) v tends to 
zero. This deficiency arises from a breakdown of Gaus- 
sian statistics for solvent number fluctuations in sub- 
Angstrom volumes. Since solvent molecules are discrete 
entities, these statistics are instead Poissonian. A small 
cavity v can contain either one solvent molecule, with 
probability (N) v , or no solvent molecules, with proba- 
bility 1 — (N) v . The free-energy cost of evacuating that 
cavity is thus -fc B Tln(l - (N) v ) « (N) v k B T. The def- 
inition of C given in Equation (lie) is a simple, con- 
tinuous way of capturing this difference in fluctuation 
statistics at tiny length-scales. The crossover occurs at 
(N) v w (2tt - 2)" 1 w 0.23. 



2. Lattice formulation 

Using Equation (6), we express H e fi[n(r)] and its com- 
ponent terms in terms of the lattice variables rij, so that 

H cB [{ni}} = Hi algc [{m}} + H int [{ni}} + H sma ii[{m}}. 

(A16) 

The integrals that define each term are then approxi- 
mated through lattice sums, with continuous fields re- 
placed by either their average values or their integrals 
over each cell. 

Equation (A15) for H Bma n[5p(r);n(r)] is the easiest to 
tackle. We begin by discretizing Equation (4), which 
defines x(r, r'), when the domains of integration for 
r and r' are V and V', respectively. In terms of the ma- 
trix Xij{V> V) defined by Equation (8), our prescription 
yields 



X(r,r') ->■ niXij(V,V')nj 



r e V, r' e V. 



Equation (lid) for <j v then follows immediately from 
Equation (A14). Equation (11c) for (N) v reasonably ap- 
proximates the integral in Equation (A13). 



To discretize Equation (All) for 7?i nt [n(r)], we need 
to choose a concrete implementation of the overbar op- 
eration that is used to define far). Following Ref. 13, we 
approximate it as a weighted average involving the cell 
and its nearest neighbors 86 , given by 



n(r) -> 



1 1 



E 



3 ( r 



The average, fa, of far) over cell i follows immediately 
from Equation (A2), and is given by Equation (lib). Fol- 
lowing our prescription, Equation (All) is then reason- 
ably discretized as the lattice sum 

H int [{n t }} w K^2fa(-pemvi). 



Discretizing Hi argc [n(r)} correctly is a surprisingly sub- 
tle challenge. Previously, 13 ' 14,45 it has been approxi- 
mated it by an Ising Hamiltonian with nearest-neighbor 
coupling 



H, 



large 



KM] 



7 A 2 ^(n, - nj f - Lip e \ 3 '< 

(ij) 1 



Unfortunately, the use of this Hamiltonian results in se- 
rious artifacts. Consider, for instance, the energetics of 
a convex vapor bubble embedded in the liquid, as rep- 
resented by the field {rii}. Many configurations of the 
field that are physically distinct have nonetheless equal 
projections onto the xy-, yz- and xz-planes, so they will 
be given equal statistical weight by the Hamiltonian. 
Hence, the use of this Hamiltonian results in an unphys- 
ical excess of entropy, as shown in detail in Section E. 
Moreover, the energetic cost of common configurations of 
the field {rii} is substantially overestimated. The Ising 
Hamiltonian assigns a large vapor bubble of radius R an 
interfacial energy of about 6irjR 2 , not 4irjR 2 . Whereas 
using a rcnormalizcd 7 can alleviate this latter problem, 27 
the problem of excess entropy is more fundamental. 

Motivated by the above deficiencies of the Ising Hamil- 
tonian, we have instead chosen to evaluate the Landau- 
Ginzburg integral in Equation (9) numerically. To pro- 
ceed, we need to construct the basis function ^(r) used in 
Equation (6). Our choice, depicted in Figure 9 for water, 
approximates the usual van der Waals construction 87 at 
a local level. We first construct a ID basis function ^(a;) 
satisfying 



w'(ip{x),0)-mil)"(x) =0, 



(A17) 



with boundary conditions ip(0) = 1 and ip{X) = 0. We 
then extend the range of ip(x) and symmetrize it so that 



and 



ip(x > A) = 0, 



ip(x < 0) = ip{-x). 



(A18) 



(A19) 
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FIG. 9. Constructing n(r) from {rii}. The binary field speci- 
fies whether the density at the center of each lattice cell should 
be that of the liquid or that of the vapor. Between cell cen- 
ters, the density is interpolated using the basis function ip{x) 
(whose form for water is shown in the lower left panel). The 
dashed lines delineate the domain of integration of the local 
free energy hi given by Equation (10). 



Finally, the three-dimensional basis function 4 r (r) is con- 
structed from the one-dimensional profiles 4>(x) to give 

V(x,y,z) = i/}(x)il>(y)ip(z). 

The field n(r) constructed from Equation (6) using this 
basis function has many useful properties: the value of 
n(r) at the center of each cell % corresponds to the state 
encoded in nf, the density interpolates smoothly between 
adjacent cells; and the density profile of a configuration 
representing an axis-aligned wall, where all rij's are 1 on 
one side of a plane and on the other, nearly reproduces 
the interface profile given by the van der Waals construc- 
tion. 

For water, we use the function w(n, (i) given in Equa- 
tion (14). This choice results in both sides of Equa- 
tion (A17) being proportional to to, so the function ip(x) 
is independent of to. In the free van der Waals theory, 
where the boundary conditions on Equation (A17) are 
ip(-oo) = 1 and V(+°o) = 0, the density profile ipo{z) 
that results is 

M z ) = [1 +tanh(z/rf)]/2, 

which accurately describes the average density profile of 
an SPC/E water slab at ambient conditions. The thick- 
ness parameter d can thus be determined from simula- 
tion. A complication due to capillary waves is that d 



grows logarithmically with simulation box size, 72 ' 88 so 
different authors quote different values of d: 1.27 A for a 
19 x 19 A 2 interface in Ref. 52 and 1.54 A for a 30 x 30 A 2 
interface in Ref. 89. We choose the smaller value be- 
cause the instantaneous configuration of n(r) should be 
blurred only by small-scale fluctuations, not by large- 
scale capillary waves, which correspond instead to differ- 
ent conformations of n(r). The profile shown in Figure 9 
corresponds to the solution of Equation (A17) when the 
more restrictive boundary conditions described above are 
imposed, with A = 4 A and d = 1.27 A. 

With concrete choices of \&(r), w(n,0), to and A, the 
integrals hi defined by Equation (10) can be evaluated. 
We discuss the choice of to below. As outlined in the 
main text, the value of hi depends only on the values of 
rij for the 8 cells j that share one of the corners of cell i. 
Out of the 256 possible configurations of {nj}, only 14 
are unique when one accounts for reflection, rotation and 
inversion symmetry Thus, only 14 distinct integrals need 
to be evaluated numerically. This decomposition bears 
a strong resemblance to the marching cubes algorithm 90 
that reconstructs interfaces in volumetric data, and is 
widely used in computerized tomography. 

In principle, the value of m is related to the surface 
tension by the relation 87 
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Jo 



w{i/>(x),0) + mij)'{xf/2 



dx. 



(A20) 



On a lattice, as exemplified above by the Ising Hamilto- 
nian, this choice results in perfect interfacial energies for 
flat axis- aligned interfaces at the expense of more com- 
mon curved interfaces. Thus, we instead choose to self- 
consistenly such that ip(x) satisfies Equation (A17) and 
the calculated interfacial energy of some reference geom- 
etry of surface area A is 7 A Equation (A20) corresponds 
to a cubic reference geometry. Since curved surfaces are 
far more common than flat one in realistic solutes, we 
instead use large spheres as our reference geometry. 

For the specific form of w(n) that we use for water, hi 
is proportional to m and ip(x) is itself independent of to. 
The above self-consistent procedure can hence be imple- 
mented quite simply. We first calculate the hi quantities 
up to a factor of to, and then pick to to obtain the correct 
interfacial energies. The resulting values of hi are given 
in Table I. 



3. Incorporating solute-solvent interactions 

A generic solute interacts with a solvent molecule 
through a potential u(r). This interaction is reflected 
in the microscopic Hamiltonian of Equation (2) as an 
additional term H u [n(r),Sp(r)] given by 

H u [n(r),Sp(r)} = |«(r)[p<n(r) + Sp(r)]. 

Upon integrating out the density fluctuations, an addi- 
tional term H u [n(r)] appears in H e s[n(r)]. Physically, 
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TABLE I. Relative interfacial free energy hi for each distinct neighboring cell configuration (diagrams after Ref. 90). Highlighted 
corners denote cells j with rij = 1, whereas the others refer to cells with rij = 0; cell i is the lower- left corner in the back. To 
aid the eye, a schematic of the implied liquid-vapor interface of each configuration is shown in orange. The values of hi are 
inversion-symmetric: interchanging highlighted and unhighlighted corners yields the same interface and interfacial energy. Also 
shown are the values of hi that would reproduce the energetics of the standard Ising lattice gas, namely 7A 2 Y2uj\ ( ni ~ n i)' 2 ■ 



Local {rij} configuration 




hi (Present model) 0.000 0.387 0.676 0.725 0.754 0.851 0.965 

hi (Ising model) 0.000 0.750 1.000 1.500 1.500 1.250 1.750 



Local {rij} configuration 



$tv ior ^ \e? 



hi (Present model) 0.983 0.857 0.910 1.104 0.965 1.040 1.134 

hi (Ising model) 2.250 1.000 1.500 2.000 1.500 2.000 3.000 



the total solvent density responds linearly to the external 
field u(r) according to the density fluctuation spectrum 
is given by x^ m H r > r ')> so that 



(p(r)) = p e n(r) - / x(r, r')c(r') 

X {m Hr,r')mr') + u(r% (A21) 



The resulting free energy change H u [n(r)] arises from the 
direct interaction of the solute and the solvent, and from 
the entropic cost of modifying the mean solvent density 
around the solute. It is given by 

H u [n(r)} = J dru(r)(p(r)> 

+ ^^/«(r)x (m) (r,r')/? U (r'). (A22) 

Note that the integrands are zero whenever r or r' are 
inside the solute. 

To implement the previous equation on a lattice, we 
have found it useful to approximate X ( r i r ') by 

X M( P r '\ ~ jxo(r-r'), r, r' e v,n{r) = n(r') = 1, 
' ]0, otherwise. 

(A23) 

We also use the one-basis set approximation, c(r) w ci, 
given in Equation (A12). Discretizing Equation (A22) 
as in the previous section then immediately yields Equa- 
tion (13). 



The terms involving the delta-functions of Equation (4) 
are trivial. Owing to the rapid oscillations in g(r) — l, the 
remaining integrals are harder to estimate. We employ a 
two-step procedure to estimate these integrals efficiently. 
We begin by subdiving the A = 4 A-resolution grid of 
cells into a much finer grid of resolution A/ = lA. For 
clarity, below we explicitly distinguish between cells in 
the coarse grid, indexed by the letters i and j, and cells in 
the fine grid, indexed by the letters a and b. We evaluate 
the integrals of the non-delta-function portion of xo on 
the fine grid without otherwise restricting the arguments 
to particular volumes V and V , and denote the result 
by Xab- Each fine cell is so small that the effect of a 
restriction on the integration domain can be estimated 
accurately with a simple interpolation formula. We then 
use these interpolated values in the fine grid to build up 
the elements of Xij(Vi V) over the coarse grid. 

To evaluate Xab, we use the Narten-Levy data for the 
structure factor S(k) of water. 54 Since the S(k) is un- 
available for wave-numbers k higher than 16 A -1 , we blur 
the domains of integration over a range of about 2tt/16A, 
which makes the values of the integrals practically insen- 
sitive to this missing data. Concretely, we introduce a 
basis function $, given by 



with 



tanh tanh — 



Appendix B: Estimating Xij(V,V') 

An essential ingredient of the model we present is the 
matrix Xij(^ V'), given by the integral in Equation (8). 



The function <p is unity around x = 0, and goes rapidly 
to zero as \x\ > A//2, with A controlling the range of x 
over which this transition occurs. We have found a value 
of 0.1 A for A to be adequate. Using the notation r a to 
denote the center of fine cell a, the value of Xab is given 
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by 

Xab = fij j #(r - v a )[g{\v - r'|) - l]$(r' - r 6 ). (Bl) 

The integral is best evaluated in Fourier space, where 
the term in square brackets appears as the experimen- 
tal S(k) profile. We overcome the convergence problems 
of a rapidly oscillating integrand by using the Haselgrove- 
Conroy integration algorithm. 91,92 To properly account 
for g(r) being exactly zero for r < 2.35 A, we further set 
Xab to exactly — pj if all points in a are within r c = 2.35 A 
from all points in b. To limit the range of Xab, we also 
set it to zero if all points in a are more than 10 A from 
all point in b. The values of Xab need only be calculated 
once at each state point of water, and we have spent con- 
siderable effort in compiling them at ambient conditions. 
Our results are included in the Supplementary Data 71 . 

For specific volumes V and V, we estimate the value 
of Xij {V 1 V) as a weighted average of the pertinent values 

of Xab, 

Xij(v,v') « pt(vnV) + EE^/ A /)wW/A}), 

aei bej 

(B2) 

where (V D V) is the volume of the overlap between 
V and V. This interpolation formula for Xij is manifestly 
linear in its arguments, so that 

Xij (V, V) + Xij (V, V") = Xij (V, V U V"), 

whenever V' and V" do not overlap. Most importantly, 
the interpolation procedure is simple, convenient, and 
correct for the limiting cases of where all the values of V a 
are either or X 3 j. 

For comparison, we have also calculated values of Xab 
from an explicit SPC/E water simulation in GROMACS 
at temperature T — 298 K and pressure p = 1 atm. The 
values are also included in the Supplementary Data 71 . 
For the quantities we have studied in the main text, using 
these values for Xab instead of those derived from the 
Narten-Levy data yields nearly identical results. 



Appendix C: Fluctuation variance 

The variance of the field Sp(r) given in Equation (4) is 
a simplification of the LCW interpolation formula, 

XLCw(r,r') = p e n(r)6(r-r')+pjn(r)[g(\r-r'\)-l]n(r'), 

to the case where n(r) only takes the values or 1. 
The discrepancies arising from using Equation (4) and 
more precise expressions for the variance are mostly 
quantitative and limited to the vicinity of liquid-vapor 
interfaces. 44 

One possible improvement to Equation (4) is given in 



Rcf. 42: 

X(r,r') = xo(r,r')- 

/ dr" / dr'" X o(r,r")x £ 1 (r",r"')xo(r"',r'), 

Jr"GE Jr"'eE 

(CI) 

where E is the empty (i.e., gaseous) region of space where 
n(r) is 0, and Xe~{Fi r ') satisfies 

/ X B 1 (r,r")xo(r",r')=5(r-r'), r, r' 6 E. 

Jr"eE 

Equations (CI) and (4) are in qualitatively agreement: 
both are zero when r or r' are in the gaseous region, 
and both reduce to xo(r,r') well into the liquid phase. 
The differences are, as expected, concentrated near the 
boundaries of E. In this refined expression, the integrand 
oscillates significantly within a lattice cell, so a lattice 
approximation to Equation (CI) proves unreliable. Be- 
cause using Equation (4) gives accurate results for all 
quantities we have examined, we regard the approximate 
Equation (4) to be acceptable, and we have not pursued 
algorithms by which Equation (CI) can be accurately 
evaluated. 



Appendix D: How well is the effect of unbalanced forces 
captured by Equation (All)? 

Above, we replaced the three terms involving fa in 
Equation (A10) by the simpler expression given in (All). 
We now justify this replacement. 

Denote by H + [n(r)] the terms dropped from Equa- 
tion (A10). They are 

H + [n(r)}=-[ [ 0(r)x(r,r')c(r') 

+ ^fff mr){x x(m))(r ' r ' )/W) - (D1) 

Using the approximation for x^ given in Equa- 
tion (A23) and the one-basis set approximation of Equa- 
tion (A12), we discretize these terms to obtain a lattice 
version of H + [n(r)], 

H +[{ n i\] = - ^2 l 4>in i Xij{v,v)n j {N) v /a v 

i,j 

+ k B T^2 /3(j) l n i [xij(v,v)/2 + Xij{v,v)]njp<f)j. 

i,3 

Because of the double sums in the formula, calculating 
i/ + [{rii}] is by far the most computationally-demanding 
part of calculating iJ e fj[{nj}]. Since a single cell flip 
changes the value of fa in up to 7 cells, calculating incre- 
mental changes to iJ + [{nj}] is also much more expensive 
than calculating incremental changes to (Equa- 
tion (13)), which has a similar structure. 
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Sphere radius R (nm) 



FIG. 10. Solvation free energies G of hard spheres as a 
function of sphere radius, where the term H+[{rn}] (Equa- 
tion (Dl)) has been included and the renormalization con- 
stant K has been set to 1 (solid black), compared to the 
simpler model in Equation (11) (circles). The averages of 
— {Hi n t[{rii}]} (red) and {H + [{rii}]} (black) are nearly propor- 
tional to each other. Left Inset: implied renormalization con- 
stant K, equal to (ffi nt [{ru}] + H+Kn,}]}/ (H int [{ni}]}. Note 
that both the numerator and denominator take on essentially 
zero value for 71 < 0.4 nm. Right Inset: Implied value of K for 
hexagonal plate solute (Figure 3) with r/ = 1.0. The implied 
value of K is similar for different r). 



Figure 10 presents the solvation free energies of hard 
spheres calculated when the i/ + [{rii}] term is included 
and the renormalization constant K is set to 1. As can 
be seen, the term corresponding to i?i n t[{^i}] has a much 
larger absolute value, and in the region where their values 
are not negligible, the average values of Hi n t[{«i}] and 
H+[{rii}] are, as claimed, essentially proportional. The 
renormalization procedure we implement thus seems jus- 
tified, a conclusion borne out by the results in the text. 
For completeness, we have verified that the solvation free 
energies of the hexagonal plate solute (Figure 3) calcu- 
lated when H + [{n^}] is included and K isl are essentially 
identical to the ones calculated using Equation (11). 



Appendix E: Comparison to the model of ten Wolde and 
Chandler 

Above, we argued that the Ising Hamiltonian estimate 
for iJi arge [n(r)] overestimates the interfacial energy of a 
sphere of radius R by a factor of 3/2. However, the lat- 
tice version of LCW theory presented by ten Wolde and 
Chandler 14 uses precisely this Hamiltonian, yet the sol- 
vation free energy of spheres seems to tend to the correct 
value as R grows. Here we explain this apparent paradox. 

Figure 1 1 shows the solvation free energies of spheres in 
the model of Ref. 14, and shows how they differ when the 
lattice cell size of A = 2.1 A is changed to A = 2.3 A. As 
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Sphere radius R (nm) 

FIG. 11. Solvation free energies G of spheres in the model of 
Ref. 14 (black), for cell sizes A = 2.1 A (solid) and A = 2.3 A 
(dashes). The use of the Ising Hamiltonian causes the average 
value of i/iargcK^i}] (red) to significantly exceed the solvation 
free energy, but also leads to large excess entropies (blue, 
TS = (H) - G). At A = 2.1 A, but not at A = 2.3 A, a 
fortuitous cancellation leads to correct solvation free energies. 

claimed, (-ffi arg0 [{ni}]) is much larger than it should be, 
but for A = 2.1 A, the excess entropy resulting from the 
unphysical degeneracies of the Ising Hamiltonian exactly 
cancels this excess energy. This fortuitous cancellation 
does not occur for different cell sizes, and will not, in 
general, hold for solutes of different geometries. 
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